Network visualization (in R) with “netplot” and motif counting (in C++) with “barry”

SCI Seminar

George G. Vega Yon, Ph.D.

Division of Epidemiology

University of Utah

2023-04-07

Whoami

  • Research Assistant Professor of Epidemiology.

  • Ph.D. in Biostatistics from USC and M.Sc. in Economics from Caltech.

  • Methodologist working at the intersection between Statistical Computing and Complex Systems Modeling.

Network visualization with netplot

You can download the slides from
ggv.cl/slides/sci2023

netplot in a nutshell

  • What: An R package for network visualization inspired by Gephi.

  • Why: An opinionated way to visualize graphs.

  • Where: You can get the dev version on GitHub (USCCANA/netplot) or the stable version on CRAN.

Other things to consider

In the case of ggplot2 (and thus, ggraph)

  • Patterns in R’s grid system are not directly available.

While ggplot2 uses grid underneath its grammar API, these features are generally not directly available in ggplot2.
– Thomas Lin Pedersen, author of ggraph (source: tidyverse.org)

  • But the gggrid package makes them accessible:

The ‘ggplot2’ package does not yet have an interface for pattern fills, but the ‘gggrid’ package (Murrell, 2022) allows us to combine raw ‘grid’ output with the ‘ggplot2’ plot.
– Paul Murrell, author of grid (source: Vectorised Pattern Fills in R Graphics)

Main features

  • Visualization engine: The grid system (the same one used by ggplot2).

  • Layout algorithms: Default uses igraph’s layout.

  • Vertex sizes: Relative to the drawing area.

Example code+output

The personal friendship network of a faculty of a UK university, consisting of 81 vertices (individuals) and 817 directed and weighted connections. The school affiliation of each individual is stored as a vertex attribute. This dataset can serve as a testbed for community detection algorithms.

How?

library(netplot)
library(igraph)

library(igraphdata)
data("UKfaculty")

# Vertex colors as a function of group membership
vcols <- V(UKfaculty)$Group
vcols <- palette.colors(
  n = length(unique(vcols))
)[vcols]

set.seed(323)
# Netplot call
nplot(
  UKfaculty,
  edge.line.breaks = 20,
  vertex.color     = vcols
  )

How? (cont)

Things to notice:

  • Vertex size autoscaled to the device size.

  • Edges colored by mixing the ego and alter colors (source + target).

  • Edges change colors continuously (gradient.)

  • Vertices and edges’ sizes scale as required by the user.

Structure

Graphical objects (Grobs)

List of 11
 $ .xlim        : num [1:2] -1 1
 $ .ylim        : num [1:2] -0.5 0.5
 $ .layout      : num [1:81, 1:2] 0.6661 0.0201 0.7327 0.5399 -0.4903 ...
 $ .edgelist    : num [1:817, 1:2] 57 76 12 43 28 58 7 40 5 48 ...
 $ .N           : int 81
 $ .M           : int 817
 $ name         : chr "graph.3"
 $ gp           : NULL
 $ vp           : NULL
 $ children     :List of 2
  ..$ background:List of 10
  .. ..- attr(*, "class")= chr [1:3] "rect" "grob" "gDesc"
  ..$ graph     :List of 5
  .. ..- attr(*, "class")= chr [1:3] "gTree" "grob" "gDesc"
  ..- attr(*, "class")= chr "gList"
 $ childrenOrder: chr [1:2] "background" "graph"
 - attr(*, "class")= chr [1:4] "netplot" "gTree" "grob" "gDesc"

Example 2: Playing with patterns

netplot supports advanced patterns. The figures feature radial gradients (vertices), linear gradients, and repeated patterns (background).

Challenges and Next steps

  • Speed up the code: grid objects can be computationally expensive to build.

  • Porter Bischof (Undergrad from UVU) will contribute and present at the INSNA Sunbelt conference (flagship conference of SNA).

Counting motifs with barry

barry in a nutshell

  • What: A C++ header-only template library for motif counting (and more.)

  • Why: Implement Discrete Exponential-Family Models [DEFMs] for phylogenetics and social network analysis.

  • Where: You can get it on GitHub (USCBiostats/barry)

Main features

About 11 K lines of C++ code built for statistical modeling:

  • Motif count using change statistics (we will return to that.)

  • Full and constrained enumeration of 0/1 arrays.

  • Computes probability function for Discrete Exponential-Family Models [DEFMs].

  • Memory and computationally efficient for pooled models.
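As a rough illustration of what full enumeration of 0/1 arrays involves, the sketch below walks over every directed graph on n nodes (no self-loops) and tallies how many share each edge count. The names `Support`, `enumerate_edge_counts`, and `normalizing_constant` are hypothetical and are not barry's API; the edges-only model is an assumption chosen to keep the example small.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical type (not barry's API): maps each edge count to the
// number of 0/1 arrays that attain it.
using Support = std::map<int, std::uint64_t>;

// Enumerates every directed graph on n nodes without self-loops by
// treating each integer in [0, 2^dyads) as one 0/1 array.
Support enumerate_edge_counts(std::size_t n) {
    assert(n >= 1);
    const std::size_t dyads = n * (n - 1);  // off-diagonal cells
    assert(dyads <= 20);                    // 2^dyads arrays: keep it feasible
    Support support;
    const std::uint64_t total = std::uint64_t{1} << dyads;
    for (std::uint64_t code = 0; code < total; ++code) {
        int edges = 0;
        for (std::uint64_t bits = code; bits != 0; bits >>= 1)
            edges += static_cast<int>(bits & 1u);  // count the ones
        ++support[edges];
    }
    return support;
}

// Normalizing constant of an edges-only model (an assumption made for
// this sketch): the sum over all arrays of exp(theta * edges).
double normalizing_constant(const Support& support, double theta) {
    double kappa = 0.0;
    for (const auto& entry : support)
        kappa += static_cast<double>(entry.second) * std::exp(theta * entry.first);
    return kappa;
}
```

Under this toy model, the probability of one particular array with k edges would be exp(theta * k) divided by the value returned by `normalizing_constant`. Grouping arrays by their statistic, rather than storing each array, is what keeps the tally small.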

Change statistics

  • Change statistics are at the core of ERGMs (Exponential-Family Random Graph Models).

  • Two great applications: (i) they make counting easy, and (ii) they can be used to sample from the ERGM probability function.
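Application (i) can be sketched as follows: start from an empty undirected graph, add the ties one at a time, and accumulate the change each new tie produces. For triangles, that change is the number of neighbors the two endpoints already share. The function name `count_by_toggling` is hypothetical, and this is an illustration of the idea rather than barry's or ergm's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;

// Returns {edge count, triangle count} of an undirected graph on n nodes,
// obtained by adding ties one at a time and summing the changes.
std::vector<int> count_by_toggling(std::size_t n, const std::vector<Edge>& edges) {
    std::vector<std::vector<int>> y(n, std::vector<int>(n, 0));
    int n_edges = 0, n_triangles = 0;
    for (const Edge& e : edges) {
        const std::size_t i = e.first, j = e.second;
        assert(i != j && i < n && j < n);
        if (y[i][j]) continue;  // tie already present: nothing changes
        n_edges += 1;           // the edge count always changes by one
        // The triangle count changes by the number of shared neighbors.
        // The diagonal is zero, so k == i and k == j never match.
        for (std::size_t k = 0; k < n; ++k)
            if (y[i][k] && y[j][k]) ++n_triangles;
        y[i][j] = y[j][i] = 1;
    }
    return {n_edges, n_triangles};
}
```

For instance, feeding the six ties of a complete graph on four nodes should accumulate to six edges and four triangles, without ever recounting the whole graph.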

Change statistics formals

  • The change statistic is a real-valued vector whose \(k\)-th entry equals the change in the \(k\)-th statistic when the \(ij\)-th tie goes from absent to present. Formally:

    \[ \delta(y_{ij}: 0\to 1) = s(\mathbf{y})_{ij}^+ - s(\mathbf{y})_{ij}^- \]

    Where \(s(\cdot)\) is a function returning graph \(\mathbf{y}\)’s observed statistics, \(s(\mathbf{y})_{ij}^+\) represents its value when \(y_{ij} = 1\), and \(s(\mathbf{y})_{ij}^-\) its value when \(y_{ij} = 0\).
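The definition can be followed literally in code: set the tie to one, compute the statistics, set it to zero, compute them again, and subtract. The sketch below does this for an undirected graph with two statistics (edge count and triangles). The names `Graph`, `make_graph`, `count_stats`, and `change_stat` are hypothetical helpers for illustration. Recounting from scratch is deliberately naive and is not how barry or ergm compute the quantity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Adjacency matrix of an undirected graph (hypothetical helper type).
using Graph = std::vector<std::vector<int>>;

// Builds a graph on n nodes from a list of undirected ties.
Graph make_graph(std::size_t n,
                 const std::vector<std::pair<std::size_t, std::size_t>>& ties) {
    Graph y(n, std::vector<int>(n, 0));
    for (const auto& t : ties) {
        assert(t.first != t.second && t.first < n && t.second < n);
        y[t.first][t.second] = y[t.second][t.first] = 1;
    }
    return y;
}

// s(y): returns {edge count, triangle count} by brute force.
std::vector<int> count_stats(const Graph& y) {
    const std::size_t n = y.size();
    int edges = 0, triangles = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            if (!y[i][j]) continue;
            ++edges;
            for (std::size_t k = j + 1; k < n; ++k)
                if (y[i][k] && y[j][k]) ++triangles;
        }
    return {edges, triangles};
}

// delta(y_ij: 0 -> 1) = s(y)^+_ij - s(y)^-_ij, computed by recounting.
std::vector<int> change_stat(Graph y, std::size_t i, std::size_t j) {
    assert(i != j && i < y.size() && j < y.size());
    y[i][j] = y[j][i] = 1;                       // network with the tie
    const std::vector<int> plus = count_stats(y);
    y[i][j] = y[j][i] = 0;                       // network without the tie
    const std::vector<int> minus = count_stats(y);
    return {plus[0] - minus[0], plus[1] - minus[1]};
}
```

Because the function forces the tie on and then off, the result is the same whether or not the tie is present in the graph passed in.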

Formals 2

\[ \mbox{logit}\left(\mathbb{P}\left(y_{ij} = 1|y_{-ij}\right)\right) = \theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right), \]

with \(\delta\left(y_{ij}:0\to 1\right)\equiv s\left(\mathbf{y}\right)_{ij}^+ - s\left(\mathbf{y}\right)_{ij}^-\) as the vector of change statistics, in other words, the difference between the statistics when \(y_{ij} = 1\) and when \(y_{ij} = 0\). Solving for the probability:

\[ \mathbb{P}\left(y_{ij} = 1|y_{-ij}\right) = \frac{1}{1 + \mbox{exp}\left\{-\theta^{\mathbf{t}}\delta\left(y_{ij}:0\to 1\right)\right\}} \]
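The conditional probability above is a logistic function of the inner product between the parameters and the change statistics. A minimal sketch, where the function name `conditional_prob` and any parameter values passed to it are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// P(y_ij = 1 | y_-ij) = 1 / (1 + exp(-theta^t delta)), where delta is
// the vector of change statistics for tie ij.
double conditional_prob(const std::vector<double>& theta,
                        const std::vector<double>& delta) {
    assert(theta.size() == delta.size());
    double eta = 0.0;  // linear predictor: theta^t delta
    for (std::size_t k = 0; k < theta.size(); ++k)
        eta += theta[k] * delta[k];
    return 1.0 / (1.0 + std::exp(-eta));
}
```

With all parameters at zero the tie is a coin flip regardless of the change statistics. A positive parameter on a statistic that the tie increases pushes the probability above one half.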

Examples of change statistics

Let’s look at the change statistics for edge count, triangles, and group-homophily when we remove tie 33-69.

Using ergm

s()               y-     y+     change
Edgecount         816    817    1
Triangles         5366   5399   33
Group-homophily   664    665    1
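Read column-wise, the table gives the two vectors in the definition of the change statistic. A worked version using only the numbers in the table, with entries ordered as edge count, triangles, and group-homophily (the ordering and the labels \(\theta_1, \theta_2, \theta_3\) are just a presentational choice):

\[ \delta(y_{33,69}: 0\to 1) = (817,\ 5399,\ 665) - (816,\ 5366,\ 664) = (1,\ 33,\ 1) \]

So, whatever the parameter values of the model are, the log-odds of tie 33-69 given the rest of the graph would be \(\theta_1 + 33\,\theta_2 + \theta_3\).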

Currently implemented models

  • Exponential-Family Random Graph Models [ERGMs].

  • DEFMs for multiple correlated outcomes (Markov Random Fields; in development with Drs. MJ Pugh and Tom Valente.)

  • Motif counting applied to counting imaginary motifs in Cognitive Social Structures [CSS] (with Dr. Kyosuke Tanaka, submitted to Social Networks).

  • Modeling the evolution of gene functions in terms of transition between functional states (research grant submitted to National Human Genome Research Institute NHGRI).

ERGMs

  • A fundamental feature of pooled models (multiple graphs/arrays).

  • A single model may feature thousands of networks.

  • But if all have the same number of nodes (and other features)… we only need to enumerate once.
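The "enumerate once" idea can be sketched as a cache keyed by the number of nodes: the first network of a given size triggers the enumeration, and every later network of the same size reuses the stored result. The names `enumerate_support`, `SupportCache`, and `enumerations_for_pool` are hypothetical. barry's actual bookkeeping is not shown in the slides, so this is only an assumption about the general shape of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// counts[k] = number of directed graphs on n nodes (no self-loops) with k edges.
std::vector<std::uint64_t> enumerate_support(std::size_t n) {
    const std::size_t dyads = n * (n - 1);
    assert(n >= 1 && dyads <= 20);  // 2^dyads arrays: keep it feasible
    std::vector<std::uint64_t> counts(dyads + 1, 0);
    const std::uint64_t total = std::uint64_t{1} << dyads;
    for (std::uint64_t code = 0; code < total; ++code) {
        std::size_t edges = 0;
        for (std::uint64_t bits = code; bits != 0; bits >>= 1)
            edges += static_cast<std::size_t>(bits & 1u);
        ++counts[edges];
    }
    return counts;
}

// Hypothetical cache: one enumeration per distinct network size.
class SupportCache {
public:
    const std::vector<std::uint64_t>& get(std::size_t n) {
        auto it = cache_.find(n);
        if (it == cache_.end()) {
            ++enumerations_;  // only pay the cost on a cache miss
            it = cache_.emplace(n, enumerate_support(n)).first;
        }
        return it->second;
    }
    std::size_t enumerations() const { return enumerations_; }

private:
    std::map<std::size_t, std::vector<std::uint64_t>> cache_;
    std::size_t enumerations_ = 0;
};

// Number of enumerations a pooled model with these network sizes would run.
std::size_t enumerations_for_pool(const std::vector<std::size_t>& sizes) {
    SupportCache cache;
    for (std::size_t n : sizes) cache.get(n);
    return cache.enumerations();
}
```

A pool of thousands of networks that all have, say, four nodes would then cost a single enumeration. In a real model the cache key would also need to include the "other features" the slide mentions, which this sketch leaves out.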

Final words

Today’s talk

  • The netplot R package for graph visualization.

  • barry: Your go-to motif accountant.

Other projects

fmcmc | ergmito | aphylo | netdiffuseR | ABCoptim
slurmR | barry | rgexf

Thanks!